-
- I have printed off the recent discussion on the new
- HTTP, HTML, MIME and UDIs, and done what I can
- to disentangle it all in my mind. I will reply
- in one message, because many of the points are linked.
- I know this should be hypertext, with references, but
- (a) I am away from home and (b) we don't yet have a
- universal mail/news archive server running to link to.
-
- HTTP and HTML
-
- First of all, Jean-Francois <jfg@dxcern.cern.ch>
- points out very properly that the enhanced HTTP
- protocol and the enhanced HTML spec are quite
- separate things, and should be specified separately.
- I agree wholeheartedly about all this, and
- I apologize for muddling the levels up till now.
-
- (As a small aside, I would point out that whereas an
- HTERR file is not very useful, an HTFWD file IS.
- It is like a hypertext soft link. But I am happy to
- leave that as a separate type of file. It should
- certainly get a different extension so that it gets a
- different icon.)
-
- HTTP: SGML vs ASN.1
-
- Let's look at the HTTP protocol first. Carl <barker@cernnext.cern.ch>
- is mapping out the requirements for this, and assuming that SGML
- would be a reasonable representation for it in practice.
- And so it is. When the requirements are clear,
- it would certainly be interesting to look at mapping them
- onto a Z39.50-style ASN.1 implementation. This would
- be useful for two reasons. First, the comparison would
- point out to us things in Z39.50 which we might not have thought of
- and which would be useful for HTTP. Second, the comparison might give
- a nice short, or at least well-defined, list of things which the WAIS
- guys might like to take into account in the next version
- of their protocol. (I demoed W3 to Brewster, who hadn't
- seen it live before and was very keen that WAIS and W3
- should merge, changing the WAIS protocol if necessary.)
-
- There is no reason why we shouldn't try both protocols.
- If they map well onto each other, it's just a question
- of having two separate parsers at the low level, building
- the same internal structures.
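-
- [A minimal sketch of the two-parser idea, in Python. Both wire
- formats here are made up for illustration: the tagged text form is
- only SGML-like, and the tag-length-value form is only a stand-in for
- an ASN.1 encoding. The names Request, parse_text, parse_binary and
- encode_binary are assumptions, not part of any spec.]

```python
from dataclasses import dataclass
import struct


@dataclass(frozen=True)
class Request:
    """Internal structure shared by both low-level parsers (hypothetical)."""
    method: str
    doc_id: str


def parse_text(data: str) -> Request:
    """Parse a made-up tagged text form such as '<GET DOC="/a/b">'."""
    body = data.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a tagged request")
    method, _, rest = body[1:-1].partition(" ")
    name, _, value = rest.partition("=")
    if name != "DOC":
        raise ValueError("missing DOC attribute")
    return Request(method=method, doc_id=value.strip('"'))


def parse_binary(data: bytes) -> Request:
    """Parse a made-up binary form: 1-byte tag, 2-byte length, then the value."""
    fields = {}
    offset = 0
    while offset < len(data):
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        fields[tag] = data[offset:offset + length].decode("ascii")
        offset += length
    return Request(method=fields[1], doc_id=fields[2])


def encode_binary(req: Request) -> bytes:
    """Inverse of parse_binary, used here only to build sample input."""
    out = b""
    for tag, text in ((1, req.method), (2, req.doc_id)):
        raw = text.encode("ascii")
        out += struct.pack(">BH", tag, len(raw)) + raw
    return out
```

- [Everything above the two parse functions sees only Request, so
- either representation could be swapped in without touching it.]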
-
- When we're talking about an SGML representation,
- and describing a file to come later down the link,
- I don't think we have to use the NOTATION= attribute with a notation
- type, because we won't in fact be talking about
- the notation of an SGML element.
- The format in this case is not something which the SGML
- parser is aware of.
-
- I must admit I was disappointed to learn that SGML
- didn't allow for any way of including 8 bit data. Thanks Eric
- <enag@ifi.uio.np> for your explanations.
-
-
- MIME and SGML
-
- Dan <connolly@pixel.convex.com> rightly points out
- the relevance of the coming MIME standards. There
- are several things which we must separate here, though:
-
- 1. The MIME classification of data formats
- 2. The MIME format for multi-part messages
- 3. The MIME format for rich text.
- 4. The MIME format for external document addresses (MIME UDIs)
-
- 1. MIME classification of data formats
-
- We must do for MIME the same disentangling job which JF did
- on HTML.
-
- First of all, the MIME job of classifying data formats
- is a useful job which is ideally done by just one
- bunch of people. There has been some suggestion that
- the MIME classifications are not well enough defined,
- but they seem to be the best effort yet and one can only
- assume they will evolve in the right direction. So I'd
- back the use of these for W3.
-
-
- 2. The MIME format for multi-part messages
-
- This is necessary for sending a multi-part
- document over a mail link. We have to ask ourselves
- whether it is reasonable to use over a binary link.
- Personally, my initial impression is that the MIME
- stuff, using as it does terminators such as
- --xxx-- separated by blank lines, looks more horrible
- to work with in this respect than SGML! Still we have
- the problem of restrictions on the content:
- Must not contain delimiters, limited 7 bit character set,
- line orientation, in fact all the things which email
- carries as a restriction. This is really taking on board
- a legacy of all the mail which has evolved over the years.
- Do we need that for our new ultra-fast hypertext access
- protocol?
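-
- [To make the mail legacy concrete, here is a small sketch using
- Python's standard email package, which is of course not what anyone
- on this list is using; it is only an illustration. The function name
- build_message and the sample payload are assumptions. The binary part
- has to be re-encoded so that the whole message stays 7-bit, and the
- parts are fenced off by a boundary string that must not occur in the
- content.]

```python
from email.message import EmailMessage


def build_message(payload: bytes) -> EmailMessage:
    """Wrap a text note and an arbitrary binary payload as a multipart message."""
    msg = EmailMessage()
    msg.set_content("See the attached data.")
    # Binary data cannot travel as-is over mail; by default it is base64-encoded.
    msg.add_attachment(payload, maintype="application", subtype="octet-stream")
    return msg


payload = bytes(range(256))          # every possible 8-bit value
msg = build_message(payload)
wire = msg.as_bytes()                # serialising also chooses the boundary
boundary = msg.get_boundary()

print(len(payload), "payload bytes became", len(wire), "bytes on the wire")
print("closing delimiter:", "--" + boundary + "--")
```

- [On a binary link none of this re-encoding would be needed, which
- is the point of the question above.]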
-
- [Compare the MIME format with the rather cleaner NeXT
- Mail format which is as far as I understand simply
- a uuencoded compressed tar file of all the bits, where
- uuencoding is designed as an optimal way of getting over
- mail transport restrictions, compress does what it says
- and tar is a multipart wrapper designed for that only. Not
- standard outside unix, perhaps, but cleaner in that the
- mail formatting is done at the last minute and doesn't
- affect the other operations]
-
- Of course, with HTTP2, multipart/alternative shouldn't
- be needed.
-
- Multipart for hypertext?
-
- Now, Dan not only suggests the use of this for
- multipart messages, but also suggests that a hypertext
- document should necessarily contain many parts,
- one in SGML and one for each link as a MIME external document.
- This means that an SGML hypertext document can never stand
- on its own! An SGML parser will always need to have
- a MIME parser sitting just outside. I don't like
- this: I feel we have to separate these two things.
-
- Suppose that an SGML document does want to
- be sent in a MIME message and does want to
- refer to other parts of that MIME message. In that case,
- it seems reasonable to have a format for that.
- However, when an SGML document is seen by itself, and
- refers to a news message for example, then there is
- no reason for it not to be able to contain a
- complete reference within itself.
-
- When SGML documents include other files, then
- the SYSTEM value is typically a file name.
- It is a reference to something outside. The
- precedent is set that SGML documents are allowed
- to refer to things outside.
-
- I think part of your objection, Dan, is based on
- a dislike of the UDI syntax -- which I'll come to later.
-
- 3. The MIME format for rich text.
-
- Here, I am not so impressed. Basically, the MIME
- people are at the same level that we were before we started
- this cleanup, in that they have SGML-LIKE stuff which isn't SGML.
- As it's not difficult to make it SGML, they should do that.
- Comparing MIME's rich text and HTML, I see that
- we lack the character formatting attributes BOLD and ITALIC,
- but on the other hand I feel that our treatment of
- logical heading levels and other structures is much more powerful
- and has turned out to provide more flexible formatting
- on different platforms than explicit semi-references
- to font sizes. This is borne out by all the systems which
- use named styles in preference to explicit formatting,
- LaTeX or other macros instead of TeX, etc etc.
-
- So technically, HTML has some things to give MIME's rich
- text. Are the MIME people still open to additions?
- If not, I would suggest we add BOLD and ITALIC (or
- two emphasis styles for characters), and keep HTML
- separate from MIME's rich text, proposing it as a
- MIME text standard.
- (HP0 and HP1 were in the HTML spec, but as unimplemented.)
-
- 4. The MIME format for external document addresses (MIME UDIs)
-
- As Ed <emv@msen.com> says, this is a bit of a non-issue,
- as MIME addresses and current style UDIs map onto
- each other. However, we have to agree on a "concrete
- syntax" (or two... :-) in the end.
-
- It's like the difference between an x400 style mail address
- generated from an internet address, and that internet address.
- Which do you prefer:
-
- timbl@zippy.lcs.mit.edu
-
- where the sections of the domain name are defined
- to have no semantics at all, or
-
- S=timbl; HO=zippy; OU=lcs; O=MIT; SECTOR=edu
-
- (this is not real x400 - don't use it!) or
-
- user=timbl
- host=zippy
- group=lcs
- organization=mit
- sector=education
-
- You say, Dan, that you "don't think [UDIs] work".
- Do you mean people don't use them in all correspondence?
- Well, what DO they use? They use ange-ftp addresses
- for FTP (like info.cern.ch:/pub/www/doc/*.ps),
- which are even more terse than UDIs! They use news
- message-ids, which are UDIs.
-
- Let me say that I personally don't much care about the
- arbitrary punctuation. There are a few things, though,
- which are important:
-
- - The thing should be printable 7-bit ASCII.
-
- Unlike arbitrary document formats,
- UDIs must be sendable in the mail
-
- - White space should not be significant. I would
- accept the presence of some arbitrary white space
- as a delimiter, but one cannot distinguish between
- different forms and quantities of white space.
- This is because things get wrapped and unwrapped.
-
- Dan, you object to UDIs because they don't
- contain white space. But that is purely so that
- you CAN wrap them onto several lines and still
- recover them. You can put white space
- in but it shouldn't mean anything. (This is not possible
- in W3 as is but it is in the UDI document)
-
- I don't see why you say they
- can't be put as an SGML attribute. They are just
- text strings. They will be quoted, of course.
- (Yes, I know the old NeXT browser doesn't quote them.)
- Is that not allowed? What are the problem characters?
- If there are SGML problem characters in the UDI spec, they
- probably are ruled out of SGML for a reason.
-
- (I recently saw a galley proof of an article in which
- our mail address had been hyphenated! UDIs must be
- squeezable into 2 inch columns.)
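-
- [The two points above, insignificant white space and quoting inside
- an attribute, can be sketched in a few lines of Python. This is an
- illustration under assumptions: the function names unwrap_udi and
- as_attribute are invented, the sample UDI is made up, and Python's
- HTML escaping is used as a stand-in for whatever SGML quoting rules
- finally apply.]

```python
from html import escape


def unwrap_udi(text: str) -> str:
    """Remove all white space, so wrapping and re-wrapping cannot change a UDI."""
    return "".join(text.split())


def as_attribute(name: str, udi: str) -> str:
    """Render a UDI as a quoted attribute value, escaping the problem characters."""
    return f'{name}="{escape(unwrap_udi(udi), quote=True)}"'


# A made-up UDI that a mailer or typesetter has wrapped over three lines.
wrapped = """file://info.cern.ch/
    pub/www/doc/
    example.ps"""

print(unwrap_udi(wrapped))
print(as_attribute("HREF", wrapped))
```

- [Because white space carries no meaning, the same string comes back
- however narrow the column it was squeezed into.]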
-
- There is a semantic difference between a tagged
- list and a punctuation-divided set, and that is that
- the former has defined semantics but the latter doesn't and
- can therefore be extended more easily. I suggest that tagging
- could be used for the four bits of an address
- that must be separable by all sides, which are
- limited in number (4). Within those bits, the string should
- be transparent as the protocol does not require
- every party to understand the innards.
-
- The bits are
-
-                    MIME         Used by
-
-   name space:      ACCESS       client
-   server details:  HOST, PORT   client, protocol-dependent
-   local doc id:    PATH         server only
-   anchor id:       (none)       presentation application only
-
- It seems useful to maintain the ability to work out which
- bits are seen by whom.
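-
- [A sketch of separating the four bits and re-expressing them as a
- tagged list, in Python. It assumes a punctuation-style UDI of the
- shape access://host:port/path#anchor, which Python's urlsplit happens
- to understand; the class UdiParts, the functions split_udi and
- as_tagged, the sample UDI and the "; " tag separator are all
- assumptions for illustration. Only the tag names ACCESS, HOST, PORT
- and PATH come from the table above.]

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class UdiParts(NamedTuple):
    name_space: str          # used by the client
    host: Optional[str]      # server details: used by the client
    port: Optional[int]      # server details: protocol-dependent
    local_doc_id: str        # used by the server only
    anchor_id: str           # used by the presentation application only


def split_udi(udi: str) -> UdiParts:
    """Split a punctuation-style UDI into the four separable bits."""
    parts = urlsplit(udi)
    return UdiParts(parts.scheme, parts.hostname, parts.port,
                    parts.path, parts.fragment)


def as_tagged(parts: UdiParts) -> str:
    """Render the bits that have MIME tags as a tagged list (anchor has none)."""
    fields = [("ACCESS", parts.name_space), ("HOST", parts.host),
              ("PORT", parts.port), ("PATH", parts.local_doc_id)]
    return "; ".join(f"{tag}={value}" for tag, value in fields
                     if value not in (None, ""))


parts = split_udi("http://info.cern.ch:80/pub/www/doc/index.html#intro")
print(parts)
print(as_tagged(parts))
```

- [Note that the path and the anchor are passed through untouched:
- only the party named in the table needs to look inside each bit.]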
-
- I only used punctuation to separate these parts in the W3 UDI
- because people like internet addresses and mail addresses
- and filenames and telephone numbers and message-ids and
- room numbers and zip codes, which don't have tags and
- do make do with punctuation. If the groundswell of
- opinion on this list is that tags are better, then
- let's use tags!
-
- Whatever we use, it should be as quotable in an SGML
- attribute as in a MIME external reference as in a
- scribbled note or a link-pasteboard or whatever.
- (The U is for Universal, NOT Unique!)
-
- PHILOSOPHY
-
- In the W3 world, the model is of a dynamic world of
- documents which generally have some "home"
- (or several), which can be found using sufficient
- intelligence and the help of one's friends, given the UDI.
-
- A mail message has no home, and so in principle the parts
- of it have no home. When a hypertext multipart message
- (really consisting of multiple hypertext documents)
- has links between its parts, they refer to each other
- within a completely isolated context.
-
- There are now two possibilities when the message is in fact
- archived and made readable. One is to say that the parts
- are then addressed as parts of the message, wherever it
- may be. The other is to say that the parts of the message
- are very likely things which had some original home.
- In that case, the message is just giving the receiver
- a copy to save him the (perhaps insurmountable) trouble
- of retrieving it. In this case the parts should be
- identified with their original UDIs so that the
- receiver is not confused with multiple documents which
- are in fact the same thing.
-
-
- I think that's all the comments I have on what I've read so far..
-
- Tim
- ________________________________________________________________
- Tim Berners-Lee
- World-Wide Web initiative
- CERN, 1211 Geneva 23, Switzerland timbl@info.cern.ch
- Visiting MIT: NE43-513, (617)234 6016 timbl@zippy.lcs.mit.edu
-